Two-level Annotation of Utterance-units in Japanese Dialogs: An Empirically Emerged Scheme
Abstract
In this paper, we propose a scheme for annotating utterance-level units in Japanese dialogs, which emerged from an analysis of the interrelationship among four schemes: i) inter-pausal units, ii) intonation units, iii) clause units, and iv) pragmatic units. The associations among the labels of these four units were illustrated by multiple correspondence analysis and hierarchical cluster analysis. Based on these results, we prescribe utterance-unit identification rules, which identify two sorts of utterance-units with different granularities: short and long utterance-units. Short utterance-units are identified by acoustic and prosodic disjuncture, and they are considered to constitute units of the speaker's planning and the hearer's understanding. Long utterance-units, on the other hand, are recognized by syntactic and pragmatic disjuncture, and they are regarded as units of interaction. We explore some characteristics of these utterance-units, focusing particularly on unit duration and syntactic properties, other participants' responses, and mismatches between the two levels. We also discuss how our two-level utterance-units are useful in analyzing cognitive and communicative aspects of spoken dialogs.
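As a rough illustration of the two-level idea only, the sketch below groups a sequence of transcript segments into short and long units. It is not the identification rules of the paper: the `Segment` class, the two boolean boundary flags, and the placeholder texts are all hypothetical, and the sketch simply assumes that some earlier annotation step has already marked where prosodic and pragmatic disjunctures occur.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A stretch of speech with hypothetical, pre-annotated boundary flags."""
    text: str
    prosodic_break: bool   # assumed: acoustic/prosodic disjuncture follows this segment
    pragmatic_break: bool  # assumed: syntactic/pragmatic disjuncture follows this segment


def group_by_flag(segments: List[Segment], flag: str) -> List[List[str]]:
    """Close a unit after every segment whose given flag is True."""
    units: List[List[str]] = []
    current: List[str] = []
    for seg in segments:
        current.append(seg.text)
        if getattr(seg, flag):
            units.append(current)
            current = []
    if current:  # keep trailing material that has no closing boundary
        units.append(current)
    return units


def short_units(segments: List[Segment]) -> List[List[str]]:
    """Finer-grained units, split at prosodic boundaries."""
    return group_by_flag(segments, "prosodic_break")


def long_units(segments: List[Segment]) -> List[List[str]]:
    """Coarser-grained units, split at pragmatic boundaries."""
    return group_by_flag(segments, "pragmatic_break")


# Placeholder data, not taken from any corpus.
example = [
    Segment("a", prosodic_break=True, pragmatic_break=False),
    Segment("b", prosodic_break=False, pragmatic_break=False),
    Segment("c", prosodic_break=True, pragmatic_break=True),
    Segment("d", prosodic_break=True, pragmatic_break=True),
]
```

Because the two flags are evaluated independently, the sketch allows the two levels to disagree, which is one simple way to surface the kind of mismatch between levels that the abstract mentions.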
Similar resources
An annotation scheme for syntactic unit in Japanese dialog
In this paper, we propose a scheme for annotating syntactic units called DCU (Dialog Clause-Unit) in Japanese dialogs. Since there are no explicit devices to mark sentence boundaries in speech, a precise definition and criteria must be designed to extract syntactic units from the utterance. We show a design of DCU which consists of clausal and non-clausal units. Annotating DCU tags to eight dialog...
Coding Dialogs with the DAMSL Annotation Scheme
This paper describes the DAMSL annotation scheme for communicative acts in dialog. The scheme has three layers: Forward Communicative Functions, Backward Communicative Functions, and Utterance Features. Each layer allows multiple communicative functions of an utterance to be labeled. The Forward Communicative Functions consist of a taxonomy in a similar style as the actions of traditional speech act ...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
A semantic representation for spoken dialogs
This paper describes a semantic annotation scheme for spoken dialog corpora. Manual semantic annotation of large corpora is tedious, expensive, and subject to inconsistencies. Consistency is necessary to increase the usefulness of a corpus for developing and evaluating spoken understanding models and for linguistic studies. A semantic representation, which is based on a concept dictionary defi...
Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs
We describe an annotation scheme for syntactic information in the CHILDES database (MacWhinney, 2000), which contains several megabytes of transcribed dialogs between parents and children. The annotation scheme is based on grammatical relations (GRs) that are composed of bilexical dependencies (between a head and a dependent) labeled with the name of the relation involving the two words (such a...